Introduction

I did a search on pixabay for the term: fighter jets. I have always been fascinated by aircraft and especially military aircraft. So, I took the opportunity to look at some cool pictures of planes.

I noticed some features that seemed to generally apply to the photos:

photo_data %>% 
  select(`Picture Links` = pageURL) %>% 
  knitr::kable()
Picture Links
https://pixabay.com/photos/jet-fighter-jet-raaf-hornets-2974131/
https://pixabay.com/photos/fighter-jet-f-15-strike-eagle-63090/
https://pixabay.com/photos/jet-ships-sea-flying-ocean-6234694/
https://pixabay.com/photos/us-navy-military-aviation-aircraft-8008429/
https://pixabay.com/photos/clouds-dramatic-clouds-fighter-jet-251293/
https://pixabay.com/photos/f-18-super-hornet-aircraft-carrier-70499/
https://pixabay.com/photos/fighter-jet-flight-air-force-viper-7503437/
https://pixabay.com/photos/plane-spaceship-hunter-fighter-jet-7332865/
https://pixabay.com/photos/jet-engine-fighter-war-fire-2515044/
https://pixabay.com/photos/plane-aircraft-3d-rendered-5568577/
https://pixabay.com/photos/aeroplane-aircraft-aviation-clouds-1851133/
https://pixabay.com/photos/f-16-thunderbird-aircraft-aviation-2678839/
https://pixabay.com/photos/plane-air-force-fighter-jet-7718500/
https://pixabay.com/photos/f-a-18-hornet-take-off-usn-2549648/
https://pixabay.com/photos/armed-forces-fighter-jet-panavia-4295760/
https://pixabay.com/photos/fighter-aircraft-saab-jas-39-gripen-7375086/
https://pixabay.com/photos/saab-jas-39-gripen-fighter-jet-7503419/
https://pixabay.com/photos/reno-airshow-airplanes-air-show-175520/
https://pixabay.com/photos/aircraft-fighter-cockpit-instrument-1367289/
https://pixabay.com/photos/boeing-f-a-18-hornet-fighter-jet-6735759/
https://pixabay.com/photos/jet-military-aircraft-airplane-war-5113118/
https://pixabay.com/photos/airbus-fighter-jet-fighter-plane-7452648/
https://pixabay.com/photos/aviation-jet-transportation-engine-7041015/
https://pixabay.com/photos/f-a-18-hornet-usmc-2657086/
https://pixabay.com/photos/panavia-tornado-fighter-jet-7503417/
https://pixabay.com/photos/fighter-aircraft-saab-jas-39-gripen-7375064/
https://pixabay.com/photos/plane-military-jet-flew-mig29-3329223/
https://pixabay.com/photos/royal-australian-air-force-4735625/
https://pixabay.com/photos/fighter-jet-wing-pilot-g-suit-7061734/
https://pixabay.com/photos/jet-engine-fighter-war-fire-2515039/
https://pixabay.com/photos/plane-jet-military-flew-3329229/
https://pixabay.com/photos/meeting-airshow-plane-military-3633535/
https://pixabay.com/photos/starfighter-jet-jet-fighter-plane-5214941/
https://pixabay.com/photos/eurofighter-super-moon-5016209/
https://pixabay.com/photos/patrol-suisse-f-5-e-tiger-7448051/
https://pixabay.com/photos/f-18-aircraft-flight-6604524/
https://pixabay.com/photos/jet-jet-fighter-plane-fighter-jet-5311077/
https://pixabay.com/photos/plane-fighter-jet-jet-flying-4735748/
https://pixabay.com/photos/propellers-airplane-vertical-starter-266704/
https://pixabay.com/photos/stealth-bombers-aircraft-7038292/
https://pixabay.com/photos/formation-flight-fujiyama-mount-fuji-62887/
https://pixabay.com/photos/armed-forces-fighter-jet-panavia-4295762/
https://pixabay.com/photos/petrol-equipment-fire-extinguisher-3079094/
https://pixabay.com/photos/f-22-raptor-stealth-aircraft-jet-2663597/
https://pixabay.com/photos/airplane-military-aircraft-jet-3338398/
https://pixabay.com/photos/saab-jas-39-gripen-fighter-jet-7485936/
https://pixabay.com/photos/uss-nimitz-ships-aircraft-carrier-108937/
https://pixabay.com/photos/great-britain-england-britain-pilot-69583/
https://pixabay.com/photos/f-16-fighting-falcon-air-force-2679331/
https://pixabay.com/photos/general-electric-j79-jet-turbine-1367294/
https://pixabay.com/photos/blue-angels-jets-navy-military-sky-946754/
https://pixabay.com/photos/sky-usaf-airplane-aviation-jet-4449724/
https://pixabay.com/photos/plane-jet-u-s-navy-f-a-18c-sky-108935/
https://pixabay.com/photos/plane-airfield-military-f18-3329219/
https://pixabay.com/photos/plane-jet-military-flew-f16-3329238/

Key Features of Selected Photos

What was the range of number of views for these photos?

# Range of view counts
view_range <- photo_data %>% 
  pull(views) %>% 
  range()

Before we analysed the data we observed that the photos had a wide variety of view counts and upon inspection of the data we find that from our selected photos, the one with the least views had 340 views and the one with the most views had 103783 views.

How many different users are associated with these photos?

# Number of users
num_users <- photo_data %>% 
  pull(user) %>% 
  unique() %>% 
  length()

Each photo in pixabay has a user associated with it and our selection of photos are associated with 30 different users.

How likely is a photo to be downloaded after it is viewed?

# Middle 75% of download view percentage
mid_75_down_view_percentage <- photo_data %>% 
  pull(download_view_proportion) %>% 
  quantile(c(0.125, 0.875)) * 100

From my investigation I found that the percentage of people who downloaded a photo after viewing it was between 45.3% and 75.8% for 75% of the photos.

Creativity

To demonstrate creativity I had a look at what tags were used to describe the selected photos. Here is a table showing some information about the 25 most popular tags.

# Separate tags into different rows
tags <- photo_data %>% 
  separate_rows(tags, sep = ", ")

# Interesting table
tags_table <- tags %>% 
  group_by(tags) %>% 
  summarise(tag_count = n(),
            mean_views = mean(views),
            mean_downloads = mean(downloads),
            mean_likes = mean(likes),
            largest = paste0(round(max(imageSize) / 1e6, 2), "MB")) %>% 
  arrange(desc(tag_count))

tags_table %>% 
  rename(Tags = tags, Count = tag_count, `Mean Views` = mean_views, `Mean Downloads` = mean_downloads, `Mean Likes` = mean_likes, `Largest Photo Size` = largest) %>% 
  slice(1:25) %>% 
  knitr::kable()
Tags Count Mean Views Mean Downloads Mean Likes Largest Photo Size
fighter jet 14 13322.6429 8066.4286 32.071429 3.27MB
jet 13 12515.1538 8453.2308 33.615385 2.91MB
plane 11 1553.2727 841.9091 8.818182 3.28MB
aircraft 10 5592.2000 3559.2000 14.900000 3.23MB
military 7 1821.8571 1088.1429 8.428571 2.91MB
air force 4 709.5000 451.2500 5.000000 2.64MB
aviation 4 5107.2500 3231.7500 16.500000 2.75MB
saab jas 39 gripen 4 534.7500 348.7500 3.500000 3MB
airplane 3 4669.6667 2072.0000 13.000000 1.67MB
fighter 3 8936.3333 4672.3333 19.666667 3.23MB
fighter plane 3 21316.3333 9852.6667 48.000000 3.12MB
flight 3 869.6667 609.6667 6.000000 2.63MB
armed forces 2 3215.0000 2245.5000 19.000000 3.27MB
engine 2 5041.5000 2004.0000 13.500000 2.86MB
f a-18 hornet 2 8069.0000 3279.0000 19.500000 1.64MB
f-18 2 12979.0000 7154.5000 25.500000 2.12MB
fighter aircraft 2 572.0000 374.5000 3.000000 3MB
jet fighter 2 659.5000 397.0000 8.000000 1.83MB
panavia 2 3215.0000 2245.5000 19.000000 3.27MB
ships 2 21009.5000 14728.5000 77.500000 2.82MB
stealth 2 6536.0000 4474.0000 34.000000 1.63MB
3d rendered 1 2654.0000 1888.0000 13.000000 2.57MB
aeroplane 1 8015.0000 4179.0000 14.000000 2.13MB
aerospace 1 1831.0000 1121.0000 10.000000 2.41MB
air show 1 38689.0000 18654.0000 71.000000 1.67MB

To give a better overview of the variety of tags used I revisited the ggwordcloud package I used in Project 2.

# Create word cloud
tags_table %>% 
  ggplot() +
  geom_text_wordcloud(aes(label = tags, size = tag_count, colour = tag_count)) +
  scale_size_area(max_size = 13) +
  theme_minimal()

I believe that I demonstrated creativity by using the separate_rows function to separate the tags information into separate rows. I then used this information to create a table with interesting information about each of the 25 most popular tags. I also investigated the ggwordcloud package to create an interesting visual to represent more information about the use of tags to describe the chosen photos.

I would like to disclose that I reused the css code form my previous projects, although I did add some formatting to the images in this project that was not present in previous projects.

Learning Reflection

I really enjoyed this module and learned about a lot of important ideas. Firstly, I really liked learning about APIs and using them to obtain data. APIs make it easy to obtain relevant and current data for analysis. We also learned about JSON data and how to work with it. Finally, I developed more skills with regards to the manipulation of data frames to get the information that I want from them.

I am really interested to learn more about different data sources and expand my knowledge of APIs. This would allow me to easily gather data to answer questions that I might have.

Appendix

# Load packages and data
library(tidyverse)
library(jsonlite)
library(magick)
library(ggwordcloud)
json_data <- fromJSON("pixabay_data.json")
pixabay_photo_data <- json_data$hits

# Filter and select some of the photos
# Get list of column names
names(pixabay_photo_data)

# Keep only some variables, create new variables, and filter images to reduce number
# download_view_proportion is the proportion of people who downloaded an image after viewing it
# popularity ranks images logarithmically into 4 categories according to views
# A popular user is onkelglocke so is_onkelglocke checks if an image's user is onkelglocke
# We filter first by imageWidth and imageHeight to keep only photos of better than 4k resolution
# Since there were still to many photos we filtered to keep only half of the images with imageSize less than the median
selected_photos <- pixabay_photo_data %>% 
  select(previewURL, pageURL, tags, imageWidth, imageHeight, imageSize, views, downloads, collections, likes, comments, user) %>% 
  mutate(download_view_proportion = downloads / views,
         popularity = ifelse(views >= 1e5, "Very Popular",
                             ifelse(views < 1e5 & views >= 1e4, "Popular",
                                    ifelse(views < 1e4 & views >= 1e3, "Average",
                                           "Unpopular"))),
         is_onkelglocke = (user == "onkelglocke")) %>% 
  filter(imageWidth >= 3840 &
           imageHeight >= 2160) %>% 
  filter(imageSize < median(imageSize))

# Save new dataframe as csv file
write_csv(selected_photos, "selected_photos.csv")

# List most popular users for selected photos
selected_photos %>% 
  group_by(user) %>% 
  summarise(n()) %>% 
  arrange(desc(`n()`))

# Data exploration and summary values
glimpse(selected_photos)

# Some exploratory plots
selected_photos %>% 
  ggplot() +
  geom_point(aes(x = download_view_proportion, y = likes,), 
             colour = "#50c4f2")

selected_photos %>% 
  ggplot() +
  geom_point(aes(x = is_onkelglocke, y = download_view_proportion), 
             colour = "#50c4f2")

selected_photos %>% 
  ggplot() +
  geom_point(aes(x = popularity, y = download_view_proportion), 
             colour = "#50c4f2")

selected_photos %>% 
  ggplot() +
  geom_point(aes(x = collections, y = log(views)), 
             colour = "#50c4f2")

selected_photos %>% 
  ggplot() +
  geom_point(aes(x = views, y = likes), 
             colour = "#50c4f2")


# mean likes score (adjusted for views) by popularity
# Do higher popularity photos (logarithmically according to views) get more likes once we account for the effect of views?
selected_photos %>% 
  mutate(like_adj = likes / views * 1e4) %>% 
  group_by(popularity) %>% 
  summarise(mean_likes = mean(like_adj))
# No, there does not seem to be such an effect

# Range of views
view_range <- selected_photos %>% 
  pull(views) %>% 
  range()

# Number of different users
num_users <- selected_photos %>% 
  pull(user) %>% 
  unique() %>% 
  length()

# Number of photos by onkelglocke
selected_photos %>% 
  pull(is_onkelglocke) %>% 
  sum()

# 75% of the selected photos have a percentage of downloads per views between these values
mid_75_down_view_percentage <- selected_photos %>% 
  pull(download_view_proportion) %>% 
  quantile(c(0.125, 0.875)) * 100 %>% 
  round()

# Create animated GIF
photos <- selected_photos %>% 
  pull(previewURL) %>% 
  image_read()

animated_photos <- image_animate(photos, fps = 1)

animated_photos %>% 
  image_write("my_photos.gif")

# Tags Analysis
# Separate tags into different rows
tags <- selected_photos %>% 
  separate_rows(tags, sep = ", ")

# Arrange tags according to how frequently they are used
popular_tags <- tags %>% 
  group_by(tags) %>% 
  summarise(cnt = n()) %>% 
  arrange(desc(cnt)) %>% 
  pull(tags)

# Five most popular tags
popular_tags[1:5]

# How many unique tags
length(popular_tags)

# Bar chart of the five most popular tags
tags %>% 
  filter(tags %in% popular_tags[1:5]) %>% 
  ggplot() +
  geom_bar(aes(x = tags), fill = "#50c4f2")

# Interesting table
tags_table <- tags %>% 
  group_by(tags) %>% 
  summarise(tag_count = n(),
            mean_views = mean(views),
            mean_downloads = mean(downloads),
            mean_likes = mean(likes),
            largest = paste0(round(max(imageSize) / 1e6, 2), "MB")) %>% 
  arrange(desc(tag_count))

# Try word cloud
tags_table %>% 
  ggplot() +
  geom_text_wordcloud(aes(label = tags, size = tag_count, colour = tag_count)) +
  scale_size_area(max_size = 6) +
  theme_minimal()